The Annals of Applied Statistics 
2012, Vol. 6, No. 3, 1209-1235 
DOI: 10.1214/11-AOAS532 

@ Institute of Mathematical Statistics, 2012 



NETWORK INFERENCE AND BIOLOGICAL DYNAMICS^ 



By Chris. J. Oates and Sack Mukherjee 

University of Warwick and Netherlands Cancer Institute, 
and Netherlands Cancer Institute and University of Warwick 

Network inference approaches are now widely used in biological ap- 
plications to probe regulatory relationships between molecular com- 
ponents such as genes or proteins. Many methods have been proposed 
for this setting, but the connections and differences between their 
statistical formulations have received less attention. In this paper, 
we show how a broad class of statistical network inference methods, 
including a number of existing approaches, can be described in terms 
of variable selection for the linear model. This reveals some subtle but 
important differences between the methods, including the treatment 
of time intervals in discretely observed data. In developing a gen- 
eral formulation, we also explore the relationship between single-cell 
stochastic dynamics and network inference on averages over cells. 
This clarifies the link between biochemical networks as they operate 
at the cellular level and network inference as carried out on data that 
are averages over populations of cells. We present empirical results, 
comparing thirty-two network inference methods that are instances of 
the general formulation we describe, using two published dynamical 
models. Our investigation sheds light on the applicability and limi- 
tations of network inference and provides guidance for practitioners 
and suggestions for experimental design. 

1. Introduction. Networks of molecular components such as genes, pro- 
teins and metabolites play a prominent role in molecular biology. A graph 
G = {V,E) can be used to describe a biological network, with the ver- 
tices V identified with molecular components and the edges E with reg- 
ulatory relationships between them. For example, in a gene regulatory net- 
work [Babu et al. (2004); Davidson (2001)], nodes represent genes and edges 
transcriptional regulation, while in a protein signaling network [Yarden and 
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Sliwkowski (2001)], nodes represent proteins and edges may represent the 
enzymatic influence of tlie parent on the biochemical state of the child, for 
example, via phosphorylation. In many biological contexts, including disease 
states, the edge structure of the network may itself be uncertain (e.g., due 
to genetic or epigenetic alterations). Then, an important biological goal is 
to characterize the edge structure (often referred to as the "topology" of 
the network) in a context-specific manner, that is, using data acquired in 
the biological context of interest (e.g., a type of cancer, or a developmental 
state). Advances in high-throughput data acquisition have led to much in- 
terest in such data-driven characterization of biological networks. Statistical 
approaches play an increasingly important role in these "network inference" 
efforts. From a statistical perspective, the goal can be viewed as making in- 
ference regarding the edge structure E in light of biochemical data y. Since 
aspects of biological dynamics may not be identifiable at steady-state, time- 
varying data is usually preferred, and this is the setting we focus on here. In 
many applications the data y arise from "global perturbation" of the cellu- 
lar system, for example, by varying culture conditions or stimuli. The extent 
to which networks can be characterized using global perturbations remains 
poorly understood, since it is likely that such data expose only a subspace 
of the phase space associated with cellular dynamics. 

The importance of network inference in diverse biological applications, 
from basic biology to diseases such as cancer, has spurred vigorous activity 
in this area. Many specific methods have been proposed, in the statistical 
literature as well as in bioinformatics and bioengineering, with some popular 
approaches reviewed in Bansal, Belcastro and Ambesi-Impiombato (2007); 
Bonneau (2008); Hecker et al. (2009); Lee and Tzou (2009); Markowetz and 
Spang (2007). Graphical models play a prominent role in this literature, as 
does variable selection. A distinction is often made between statistical and 
"mechanistic" approaches [Ideker and Lauffenburger (2003)]. The former is 
usually used to refer to models that are built on conventional regression for- 
mulations and variants thereof, while the latter usually refers to models that 
are explicitly rooted in chemical kinetics, for example, systems of coupled 
ordinary differential equations (ODEs). This distinction is somewhat artifi- 
cial, since it is possible in principle to carry out formal statistical network 
inference based on mechanistic models (e.g., systems of ODEs), although 
this remains challenging [Xu et al. (2010)]. 

Many network inference schemes are based on formulations that are closely 
related in terms of the underlying statistical model. For example, vector au- 
toregressive (VAR) models [including Granger causality-related approaches 
as special cases; Bolstad, Van Veen and Nowak (2011); Meinshausen and 
Biihlmann (2006); Morrissey et al. (2010); Opgen-Rhein and Strimmer (2007); 
Zou and Feng (2009)], linear dynamic Bayesian networks [DBNs; Kim, Imoto 
and Miyano (2003)] and certain ODE-based approaches [Bansal and di Ber- 
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nardo (2007); Li and Chen (2010); Nam, Yoon and Kim (2007)] are inti- 
mately related, being based on linear regression, but with potentially dif- 
fering approaches to variable selection. In recent years, several empirical 
comparisons of competing network inference schemes have emerged, in- 
cluding Altay and Emmert-Streib (2010); Bansal, Belcastro and Ambesi- 
Impiombato (2007); Hache, Lehrach and Herwig (2009); Smith, Jarvis and 
Hartemink (2002); Werhli, Grzegorczyk and Husmeier (2006). Assessment 
methodology has received attention, including attempts to automate the 
generation of large scale biological network models for automatic bench- 
marking of performance [Marbach et al. (2009); Van den Bulcke et al. 
(2006)]. In particular, the Dialogue for Reverse Engineering Assessments 
and Methods (DREAM) challenges [Prill et al. (2010)] have provided an 
opportunity for objective empirical assessment of competing approaches. At 
the same time, developments in synthetic biology have led to the availability 
of gold standard data from hand-crafted biological systems, such that the 
underlying network is known by design [Camacho and Collins (2009); Can- 
tone et al. (2009); Minty, Varedi and Nina (2009)]. However, relatively little 
attention has been paid to the (sometimes contrasting) assumptions of the 
statistical formulations underlying these network inference schemes. 

Inferential limitations due to estimator bias and nonidentifiability remain 
incompletely understood. It is clear that chemical reaction networks (CRNs; 
these are graphs that give detailed descriptions of individual reactions com- 
prising the overall system) underlying biological networks are not in general 
identifiable [Craciun and Pantea (2008)]. Indeed, there exist topologically 
distinct CRNs which produce identical dynamics under mass-action kinet- 
ics. Moreover, even when the true network structure is known, reaction rates 
themselves may be nonidentifiable. However, mainstream descriptions of bio- 
logical networks, for example, gene regulatory or protein signaling networks, 
are coarser than CRNs. Such networks are useful because they are closely 
tied to validation experiments in which interventions (e.g., RNA interfer- 
ence or inhibitors) target network vertices. For example, inference of an 
edge in a gene regulatory network corresponds to the qualitative prediction 
that intervention on the parent will influence the child (via transcription 
factor activity). It remains unclear to what extent such biological network 
structure can be usefully identified from various kinds of data. On the other 
hand, Wilkinson (2006); Wilkinson (2009) discusses a number of general 
issues relating to stochastic modeling for systems biology, but does not dis- 
cuss network inference per se in detail. This paper complements existing 
empirical work by focusing on statistical issues associated with linear mod- 
els commonly used in network inference applications. 

Network inference methods can be viewed as generating hypotheses about 
cell biology. Yet the link between biochemical networks at the cellular level 
and network inference as applied to bulk or aggregate data (i.e., data that 
are averages over large numbers of cells) from assays such as microarrays 
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remains unclear. In applications to noisy time- varying data there is uncer- 
tainty in the predictor variables of the same order of magnitude as uncer- 
tainty in the responses, yet often only the latter is explicitly accounted for. 
Moreover, the treatment of time intervals in discretely observed data re- 
mains unclear, with contradictory approaches appearing in the literature. 
Most high-throughput assays, including array based technologies (e.g., gene 
expression or protein arrays), as well as single-cell approaches (e.g., FACS- 
based) involve destructive sampling, that is, cells are destroyed to obtain 
the molecular measurements. The impact of the resulting nonlongitudinal- 
ity upon inference does not appear to have been investigated. 

The contributions of this paper are threefold. First, we explore the connec- 
tion between biological networks at the cellular level and the linear statisti- 
cal models that are widely used for inference. Starting from a description of 
stochastic dynamics at the single-cell level, we describe a general statistical 
approach rooted in the linear model. This makes explicit the assumptions 
that underlie a broad class of network inference approaches. This also clar- 
ifies the relationship between "statistical" and "mechanistic" approaches to 
biological networks. Second, we explore how a number of published network 
inference approaches can be recovered as special cases of the model we arrive 
at. This sheds light on the differences between them, including how different 
assumptions lead to quite different treatments of the time step. Third, we 
present an empirical study comparing 32 different approaches that are spe- 
cial cases of the general model we describe. To do so, we simulate stochastic 
dynamics at the single-cell level from known networks, under global pertur- 
bation of two published dynamical models. This enables a clear assessment 
of the network inference methods in terms of estimation bias and consis- 
tency, since the true data-generating network is known. Furthermore, the 
simulation accounts for both averaging over cells, nonlongitudinality due 
to destructive sampling and the fact that only a subspace of the dynamical 
phase space is explored. Using this approach, we investigate a number of data 
regimes, including both even and uneven sampling, longitudinal and non- 
longitudinal data and the large sample, low noise limit. We find that the net 
effect of predictor uncertainty, nonlongitudinality and limited exploration 
of the dynamical phase space is such that certain network estimators fail 
to converge to the data-generating network even in the limits of large data 
sets and low noise. However, we point to a simple formulation which might 
represent a default choice, delivering promising performance in a number of 
regimes. 

An implication of our analysis is that uneven time steps may pose infer- 
ential problems, even when using models that apparently handle the sam- 
pling intervals explicitly. We therefore investigate this case by carrying out 
network inference on unevenly sampled data using a variety of statistical 
models. We find that the ability to reconstruct the data-generating network 
is much reduced in all cases, with some approaches faring better than others. 
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Since biological data are often unevenly resolved in time, this observation 
has important implications for experimental design. 

The remainder of this paper is organized as follows. We begin in Section 2 
with a description of stochastic dynamics in single cells and show how a series 
of assumptions allow us to arrive at a statistical framework rooted in the 
linear model. Section 3 contains an empirical comparison of several inference 
schemes, addressing questions of performance and consistency in a number 
of regimes. In Section 4 we discuss our results and point to several specific 
areas for future work. 

2. Methods. The cellular dynamics that underlie network inference are 
subject to stochastic effects [Elowitz, Levine and Siggia (2002); Kou, Xie and 
Liu (2005); McAdams and Arkin (1997); Paulsson (2005); Swain, Elowitz 
and Siggia (2002)]. We therefore begin our description of the data-generating 
process at the level of single cells and then discuss the relationship to aggre- 
gate data of the kind acquired in high-throughput biochemical assays. We 
then develop a general statistical approach, rooted in the linear model, for 
data from such a system observed discretely in time. We discuss inference 
and show how a number of existing approaches can be recovered as special 
cases of the general model we describe. Our exposition clarifies a number 
of technical but important distinctions between published methodologies, 
which until now have received little attention. 

2.1. Data- generating process. 

2.1.1. Stochastic dynamics in single cells. Let X = {Xi, . . . ,Xp) G X de- 
note a state vector describing the abundance of molecular quantities of inter- 
est, on a space X chosen according to physical and statistical considerations. 
The components of the state vector (e.g., mRNA, protein or metabolite lev- 
els) are identified with the vertices of the graph G that describes the biolog- 
ical network of interest. In this paper the "expression levels" X(t) of a single 
cell at time t are modeled as continuous random variables that we assume 
satisfy a time-homogenous stochastic delay differential equation (SDDE) 



where f , g are drift and diffusion functions respectively, J^x(i) = {^(•s) ■ s <t} 
is the natural filtration (the history of the state vector X) and B denotes 
a standard Brownian motion. A continuous state space X is appropriate as 
a modeling assumption only if the copy numbers of all molecular components 
are sufficiently high. This is thought to be the case for the biological systems 
considered in this paper, but in general the stochasticity due to low copy 
number will need to be encoded into inference [Paulsson (2005)]. The edge 
structure E of the biological network G is defined by the drift function f , 
such that (i, j) € E <J=^ fji.^) depends on Xj. 
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We further assume that the functions f , g are sufficiently regular and de- 
pend only on recent history J-^{[t — T,t]). For example, in the context of 
gene regulation r might be the time required for one cycle of transcription, 
translation and binding of a transcription factor to its target site, the char- 
acteristic time scale for gene regulation. This is a finite memory requirement 
and can be considered a generalization of the Markov property. Equivalently, 
this property codifies the modeling assumption that the observed processes 
are sufficient to explain their own dynamics, that there are no latent vari- 
ables. It is common practice to take r = 0, in which case the process defined 
by equation (1) is Markovian. This stochastic dynamical system with phase 
space {(f(J^x))X) : X G X} forms the basis of the following exposition. 

2.1.2. Aggregate data. A variety of experimental techniques, including, 
notably, microarrays and related assays, capture average expression levels 
j^{N) ._ ^^^X^/A^ over cells, where X'^ denotes the expression levels in 
cell k. This paper does not consider effects due to intercellular signaling, 
which are typically assumed to be negligible. Then averaging sacrifices the 
finite memory property (a generalization of the fact that the sum of two 
independent Markov processes is not itself Markovian) . However, it is usually 
possible to construct a finite memory approximation of the form 

(2) dX(^) = f (7-x(iv) ) dt + g(^) (7-x(iv, ) dB(^) 

using a so-called "system size expansion" [Van Kampen (2007)]. Approxi- 
mations of this kind derive from a coarsening of the underlying state space, 
assuming that the new state vector X^^-* captures every quantity relevant 
to the dynamics. The statistical models discussed in this paper rely upon 
coarsening assumptions in order to control the dimensionality of state space. 

Using the mild regularity conditions upon cellular stochasticity g, the 
laws of large numbers give that in the large sample limit the sample aver- 
age X°° := limAT^ooX^^) = E(X) equals the expected state of a single cell 
(almost surely). We note that the relationship between the single-cell dy- 
namics as it appears in equation (1) and this deterministic limit may be 
complicated, since in general E(f(J"x)) 7^ f(-^E{X))- However, for linear f, 
say, for simplicity, f = f(X) = AX, we have 

TV N 

k=l k=l 
N N 

= Nl2^^'dt + -Y,giT^.)dB'' 

, , k=\ k=\ 
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= AX(^) dt + R(^) = f ( Jxiiv) ) dt + R(^) , 

where R*^^) := g(-7Sc'^) '^^'^/^ ~^ almost surely as ^ oo, and so 
d^°°/dt = f(-Fx°°)- In other words, the average over large numbers of cells 
shares the same drift function as the single cell, so that inference based on av- 
eraged data applies directly to single-cell dynamics. Otherwise this may not 
hold, that is, dX°^/dt = dE{'X)/dt = E(f(J-x)) 7^ f (-7te(x)) = f(-7^x-)- This 
has implications when using nonlinear forms, such as Michaelis-Menten or 
Hill kinetics, to describe the behavior of a large sample average; these non- 
linear functions are derived from single-cell biochemistry and may not apply 
equally to the large sample average X°°. The error entailed by commuting 
drift and expectation may be assessed using the multivariate Feynman-Kac 
formula for = E(X) [0ksendal (1998)]. 

In practice, the observation process may be complex and indirect, for 
example, measurements of gene expression may be relative to a "house- 
keeping" gene, assumed to maintain constant expression over the course of 
the experiment. Moreover, the details of the error structure will depend 
crucially on the technology used to obtain the data. To limit scope, this 
article assumes the averaged expression levels X°°(t) are observed at dis- 
crete times t = tj (0 < j < n) with additive zero-mean measurement error as 
Y(tj) = X°°(tj) +Wj, where the Wj are independent, identically distributed 
uncorrelated Gaussian random variables. 

2.2. Discrete time models. Network inference is usually carried out using 
coarse-grained models [equation (2)] that are simpler and more amenable to 
inference than the process described by equation (1). Here, informed by the 
foregoing treatment of cellular dynamics, we develop a simple network infer- 
ence model for data observed discretely in time. We clarify the assumptions 
of the statistical model, and show how several published approaches can be 
recovered as special cases. 

2.2.1. Approximate discrete time likelihood. Network inference entails 
statistical comparison of networks G € G, where Q denotes the space of 
candidate networks. The space Q may be large (naively, there are 2^^^ pos- 
sible networks on P vertices), although biological knowledge may provide 
constraints. Network comparisons require computation of a model selection 
score for each network, that is, considered, which in turn entails use of the 
likelihood (e.g., maximization of information criteria, or integration over the 
likelihood in the Bayesian setting). Therefore, exploration over large model 
spaces is often only feasible given a closed-form expression for the likelihood 
(or preferably for the model score itself). 
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However, the likelihood for a SDDE model [equation (2)] is not gener- 
ally available in closed form. There has been recent research into computa- 
tionally efficient approximate likelihoods for fully observed, noiseless diffu- 
sions [Hurn, Jeisman and Lindsay (2007)], but it remains the case that the 
most efficient (though least accurate) closed-form approximate likelihood is 
based on the Euler-Maruyama discretization scheme for stochastic differen- 
tial equations (SDKs), which in the more general SDDE case may be written 
as (henceforth dropping the superscript N) 

(4) X(t,) « X(t,_i) + A,f(Jx(ij-i)) + g{Tx{tj-i))AB,, 

where ABj ~ A^(0, Ajl) and Aj = tj — tj-i is the sampling time interval. 
Incorporating measurement error into this so-called Riemann-Ito likelihood 
[Dargatz (2010)] requires an integral over the hidden states X which would 
destroy the closed-form approximation. Therefore, the observed, nonlongi- 
tudinal data y are directly substituted for the latent states X, yielding the 
(triply) approximate likelihood 

n 

(5) Mij)=y(f,_i) + A/(Jy(f,„i)), 

S(t,)=A,g(jy(t,_l))g(jy(t,_l))'. 

Here A^(»;/z,S) denotes a Normal density with mean fi and covariance S. 
Implicit here is that the functions f , g depend on J^y only through time lags 
which coincide with the measurement times tj-i- 

Thus, C may be obtained from a state-space approximation to the original 
SDDE model [equation (2)]. Despite reported weaknesses with the Riemann- 
Ito likelihood [Dargatz (2010); Hurn, Jeisman and Lindsay (2007)] and the 
poorly characterized error incurred by plugging in nonlongitudinal observa- 
tions, this form of approximate likelihood is widely used to facilitate network 
inference [equations (5) and (6) correspond to a Gaussian DBN for the ob- 
servations y, generalized to allow dependence on history]. This is due both 
to the possibility of parameter orthogonality, allowing inference to be per- 
formed for each network node separately, and the possibility of conjugacy, 
leading to a closed-form marginal likelihood 7r(y|G). 

2.2.2. Linear dynamics. Kinetic models have been described for many 
cellular processes [Cantone et al. (2009); Schoeberl et al. (2002); Swat, 
Kel and Herzel (2004); Wilkinson (2009)]. However, statistical inference for 
these often nonlinear models may be challenging [Bonneau (2008); Wilkin- 
son (2006); Wilkinson (2009); Xu et al. (2010)]. Moreover, there is no guar- 
antee that conclusions drawn from cellular averages will apply to single 
cells, because, as noted above, the deterministic behavior seen in averages 
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may not coincide with the single-cell drift. However, linear dynamics satisfy 
E(f(J"x)) = f(-^E(X)) exactly, so that conclusions drawn from verages ap- 
ply directly to single cells. For notational simplicity consider the Markovian 
T = regime. A Taylor approximation of the cellular drift f about the origin 
gives 

(6) f(X)«f(0)+Z?f|x=oX, 

where Df is the Jacobian matrix of f . The constant term can be omitted 
(f(0) = 0), since absent any regulators there is no change in expression. 
Then, the Jacobian Df captures the dynamics approximately under a lin- 
ear model. Furthermore, the absence of an edge in the network G implies 
a zero entry in the Jacobian, that is, (i, j) ^ E ^ {Di)ji = 0. Obtaining the 
Jacobean at x = therefore does not imply complete knowledge of the edge 
structure E. We note that the general SDDE case is similar but with addi- 
tional differentiation required for the additional dependencies of f . Hence- 
forth, we write equations for the simpler Markovian model, although they 
hold more generally. 

One may ask whether the restriction to linear drift functions allows the 
computational difficulties associated with inference for continuous time mod- 
els to be avoided, since in the Markovian (r = 0) case both the SDE [equa- 
tion (1)] and limiting ordinary differential equation (ODE) have exact closed 
form solutions. In the ODE case, for example, X(t) = exp(At)Xo and under 
Gaussian measurement error the likelihood has a closed form as products 
of terms AA(y(tj); exp(Atj)Xo, M), where the parameters 9 = (A,Xo,M) 
include the model parameters A, initial state vector Xq and the measure- 
ment error covariance M. Unfortunately, evaluation of the matrix exponen- 
tial is computationally demanding and inference for the entries of A must 
be performed jointly since, in general, exp(A) does not factorize usefully. 
It therefore remains the case that inference for continuous time models is 
computationally burdensome, even when the models are linear. 

2.2.3. The dynamical system as a regression model. The Jacobian Di 
with entries {Dt)ij = dfi/dxj\y^=Q is now the focus of inference. We can 
identify the Jacobian with the unknown parameters in a linear regression 
problem by modeling the expression of gene p using 



(7) 



'dXp{ti)- 






•• Xp{to) - 




'iDf)p,i- 


_dXp{tn) . 




.Xi{tn-l) ■ 


■■ Xp{tn-l)_ 




.iDf)p,P. 



where the gradients dXp{tj) are approximated by finite differences, in this 
case {Xp{tj) — Xp{tj-\))/ Our notation for finite differences should not 
be confused with the differentials of stochastic calculus. More generally, for 
processes with memory, the matrix may be augmented with columns corre- 
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spending to lagged state vectors and the vector (L)f )p^, augmented with the 
corresponding derivatives of the drift function f with respect to these lagged 
states. To avoid confusion, we write A for Df when discussing parameters, 
since the drift f is unknown. Similarly, design matrices will be denoted by B 
to suppress the dependence on the random variables X. So equation (7) may 
be written compactly as 

(8) dXp^B^;.. 

Inference for the parameters Ap^, may be performed independently for each 
variable p. While equation (8) is fundamental for inference, one can equiva- 
lently consider the dynamically intuitive expression 

(9) dX{tj) ^ AB'j ,. 

An interesting issue arises from the dual interpretation of the regression 
model as a dynamical system [equation (9)], because there are natural re- 
strictions on A to avoid the solution tending to infinity. For instance, if the 
sampling interval A is constant, then we require M(A) < for each eigen- 
value A of A-|- AI. The inference schemes which we discuss do not account for 
this, because the condition forces a nontrivial coupling between rows 
jeopardizing parameter orthogonality. 

Finally, the generative model is specified by substituting noisy, nonlongi- 
tudinal observables Y for latent variables X into equation (9) and stating 
the dependence of the approximation error on the sampling interval A-,-. 
Under uncor related Gaussian measurement error we arrive at a model 

(10) dY{tj) ~ N{AB'^^„ h{Aj)V{al 4)), 

where h : IR+ — )> M"^ is a variance function that must be specified and V^v) 
represents the diagonal matrix induced by the vector v. 

There are a number of ways in which this regression is nonstandard. For 
example, the substitution of (nonlongitudinal) observations for latent vari- 
ables is clearly unsatisfactory because the linear regression framework does 
not explicitly allow for uncertainty in the predictor variables B. It is unclear 
whether this introduces bias or leads to an overestimate of the significance 
of results. Moreover, it is unclear how to choose the variance function h, 
since the Euler-Maruyama approximation [equation (4)] is only valid for 
small sampling intervals Aj, but in this regime the responses dY{tj) are 
dominated by measurement error, such that the data may carry little infor- 
mation. These issues are investigated in Sections 3 and 4 below. 

2.3. A unifying framework. Equation (10) describes a class of models 
with specific instances characterized by choice of design matrix B and vari- 
ance function h. Since any such model corresponds to the linear regression 
equation (7), the task of determining the edge structure of the network, or, 
equivalently, the location of nonzero entries in the Jacobian A, can be cast 
as a variable selection problem. 
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Table 1 

A nonexhaustive list of network inference schemes rooted in the linear model. 
The examples from literature demonstrate the statistical features indicated, 
but may differ in some aspects of implementation. The symbol denotes 
the VAR(g) model which lacks a variance function 





Variance 






Design 


function 






matrix B 


h(A) oc 


Variable selection 


Example 


Standard 




Ridge regression 


Bansal and di Bernardo (2007) "TSNIB" 


Standard with 





Group LASSO 


Bolstad, Van Veen and Nowak (2011) 


lagged predictors 








Quadratic 





Conjugate Bayesian 
with network prior 


Hill, Lu and Mohna (2011) 


Standard 





Information criteria 


Kim, Imoto and Miyano (2003) 


Nonlinear (Hill) 


1 


AIC with backstcpping 


Li and Chen (2010) 


basis functions 








Standard 


1 


Conditional independence 
tests 


Li et al. (2011) "DELDBN" 


Standard 





Semi-conjugate Bayesian 


Morrissey et al. (2010) 


Standard 




SVD and pscudoinversc 


Nam, Yoon and Kim (2007) "LEARNe" 


Standard 





Multi-stage analytic 
shrinkage approach 


Opgen-Rhein and Strimmer (2007) 


Standard and 





Granger causality 


Zou and Feng (2009) 



nonlinear with 



lagged predictors 



A number of specific network inference schemes can now be recovered by 
fixing the design matrix and variance function and coupling the resulting 
model with a variable selection technique. A selection of published network 
inference schemes that can be viewed in this way is presented in Table 1. 
One might see these schemes classed as VAR models [Bolstad, Van Veen and 
Nowak (2011); Morrissey et al. (2010); Opgen-Rhein and Strimmer (2007); 
Zou and Feng (2009)], DBNs [Hill, Lu and Molina (2011); Kim, Imoto and 
Miyano (2003)] or ODE-based approaches [Bansal and di Bernardo (2007); 
Li and Chen (2010); Nam, Yoon and Kim (2007)], although as we have 
demonstrated this classification disguises their shared foundation in the lin- 
ear model. 

As shown in Table 1, the variance functions h, and therefore sampling 
intervals Aj, are not treated in a consistent way in the literature. In the 
special case of even sampling times Aj = A, a model is characterized only 
by its design matrix. If the standard design matrix is used, then the entire 
family of models 

(11) Yfe) - Yfe-i) ^ Ar(AY(t,„i), /^(A)P(a?, . . . , 4)) 
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Y(t,)~iV(AY(t,_i),P(a?,...,4)) 



where A = A A + 1 and ct^ = A^^(A)crp. More generally, the VAR(g') model 
is prevalent in the literature (see Table 1), yet it does not explicitly han- 
dle uneven sampling intervals. This is a potentially important issue since 
uneven sampling is commonplace in global perturbation experiments, with 
high frequency sampling used to capture short term cellular response and 
low frequency sampling to capture the approach to equilibrium. We discuss 
the importance of modeling using a variance function, and whether a natural 
choice for such a function exists in Section 4 below. In addition, we explored 
whether inference may be improved through the use of either nonlinear ba- 
sis functions or lagged predictors to capture respectively nonlinearity and 
memory in the underlying drift function is unclear. Section 3 presents an 
empirical investigation of these issues. 

2.4. Inference. An appealing feature of the discrete time model is that 
parameters corresponding to different variables are orthogonal in the Fisher 



As a consequence, network inference over G may be factorized into P inde- 
pendent variable selection problems. For definiteness we focus on just two 
approaches to variable selection, the Bayesian marginal likelihood and AIC, 
but note that many other approaches are available, including those listed 
in Table 1, and can be applied here in analogy to what follows. Below we 
assume the response vector dyph"^^"^ and the columns of the design matrix 
B/i~^/^ are standardized to have zero mean and unit variance, but for clarity 
subsume this into unaltered notation. 

2.4.1. Bayesian variable selection. For simplicity, the variance function 
is initially taken to be constant {h = l). We set up a Bayesian linear model 
conditional on a network G using Zellner's g-prior [Zellner (1986)], that is, 
with priors Ap^,\ap ~ N{0,apn{'B'p'Bp)~^) and vr(o"p) oc l/cTp where Bp is the 
design matrix B with nonpredictors removed according to G. We note that 
while the g-prior is a common choice, alternatives may offer some advantages 
[Deltell (2011); Friedman et al. (2000)]. 

Let mp be the number of predictors for variable p in the network G. 
Integrating the likelihood [induced by equation (10)] against the prior for 
{Ap^,,ap) produces the following closed- form marginal likelihood: 



sense: 



p 



(13) 




p=i 



(14) 
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where dy^ = Bp(BpBp) ^B^dyp. These formulae extend to arbitrary vari- 
ance functions h by substituting B i— )■ B/i^/^, dy i— ?> dyh}^"^ . Network in- 
ference may now be carried out by Bayesian model averaging, using the 
posterior probability of a directed edge from variable i to variable j: 

(15) P(. regulates j) = ^(yigMG) .|(.^^.) ^ j^^^^y 

In experiments below, we take a network prior which, for each variable p, 
is uniform over the number of predictors nip up to a maximum permissi- 



bly 



ble in-degree dmax, that is, ir{G) cx Yl^ ) I{mp < dmax}, but note that 
richer subjective network priors are available in the literature [Mukherjee 
and Speed (2008)]. Finally, a network estimator G is obtained by thresh- 
olding posterior edge probabilities: (z,j) G E{G) regulates j) > e. For 
small maximum in-degree dmaxj exact inference by enumeration of variable 
subsets may be possible. Otherwise, Markov chain Monte Carlo (MCMC) 
methods can be used to explore an effectively smaller model space [Ellis and 
Wong (2008); Friedman and Roller (2003)]. In the experiments below we use 
exact inference by enumeration. 

2.4.2. Variable selection by corrected AIC. Again, consider a constant 
variance function (/i = 1); rescaling as described above recovers the general 
case. The usual maximum likelihood estimates Ap^, = (BpBp)~^Bp(iyp and 

(Tp = ^ ^j{dyp{tj) — dyp{tj))'^ induce closed forms Cpa~"' for the maximized 
factors of the likelihood function, where Cp is a constant not depending 
on the choice of predictors. Corrected AIC scores [Burnham and Anderson 
(2002)] for each variable p are then 

(16) AIC,(p, G)=n log{al) + 2mp + ^""^^""^ + 



n — rrin — 1 



Again we consider all models with maximum permissible in-degree dmax- 
Lowest scoring models are chosen for each variable in turn, inducing a net- 
work estimator G. 



3. Results. In this section we present empirical results investigating the 
performance of a number of network inference schemes that are special cases 
of the general formulation described by equation (10). Objective assessment 
of network inference is challenging [Prill et al. (2010)], since for most bio- 
logical applications the true data-generating network is unknown. We there- 
fore exploit two published dynamical models of biological processes, namely, 
Cantone et al. (2009) and Swat, Kel and Herzel (2004), described in de- 
tail in the Supplemental Information [SI; Oates and Mukherjee (2011)]. The 
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first is a synthetic gene regulatory network built in the yeast Saccharomyces 
cerevisiae. These five gene network and associated delay differential equa- 
tions (DDEs) have received attention in computational biology [Camacho 
and Collins (2009); Minty, Varedi and Nina (2009)], and have been shown 
to agree with gold-standard data [at least under an E(f(Jx)) ~f(-^E(X)) 
assumption]. Cantone et al. consider two experimental conditions: "switch- 
on" and "switch-off." In this paper "switch-on" parameter values were used 
to generate data. The Swat model is a gene-protein network governing the 
Gi/S transition in mammalian cells. The model has a nine-dimensional state 
vector and, unlike Cantone, is Markovian. We note that this model has not 
been directly verified in the manner of Cantone but is based on a theoret- 
ical understanding of cell cycle dynamics. There is undoubtedly bias from 
this essentially arbitrary choice of dynamical systems, but a comprehensive 
sampling of the (vast) space of possible networks and dynamics is beyond 
the scope of this paper. 

3.1. Experimental procedure. 

3.1.1. Simulation. We consider global perturbation data by initializing 
the dynamical systems from out of equilibrium conditions. This is a com- 
mon setting for network inference approaches, but the limitations of infer- 
ence from such data remain incompletely understood. For each dynamical 
system f, trajectories X'^ of single-cell expression levels were obtained as 
solutions to the SDDE equation (1) with drift f and uncorrelated diffu- 
sion g(X) = cJccii^^(X) (representing multiplicative cellular noise). Trajecto- 
ries were obtained by numerically solving SDDEs with heterogeneous initial 
conditions using the Euler-Maruyama discretization scheme [equation (4)]. 
MATLAB R2010a code for all simulation experiments is available in the 
SI. To mimic destructive sampling and consequent nonlongitudinality, solu- 
tions were regenerated at each time point. We are interested in data that 
are averages over a large number of single-cell trajectories. However, the 
computational cost of solving N xn SDDEs to produce each data set is pro- 
hibitive. Therefore, only a smaller number A^* ^ A^ of cells were simulated 
and a larger sample A^ then obtained by bootstrapping, that is, resampling 
from the A^* trajectories with replacement. In practice, A^* should be taken 
sufficiently large such that a negligible change in experimental outcome re- 
sults from further increase in A^*. Initial conditions for single-cell trajectories 
varied with standard deviation <Tcg11- Finally, uncorrelated Gaussian noise of 
magnitude fJmcas was added to simulate a measurement process with addi- 
tive error. In the experiments presented below, A^ = 10,000, A^* = 30 and 
n = 20 time points are taken within the dynamically interesting range (0- 
280 minutes for Cantone and 0-100 minutes for Swat). Measurement er- 
ror and cellular noise are set to give signal-to-noise ratios (X)/crnicas ~ 10, 
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Fig. 1. Two published dynamical systems models of cellular processes were used to gen- 
erate data sets. Single-cell trajectories were generated from an SDDE model [equation (1)] 
and averaged under measurement noise and nonlongitudinality due to destructive sampling, 
(a,) Data generated from (a model due to) Cantone et al. (2009), describing a synthetic 
network built in yeast, (h ) Data generated from Swat, Kel and Herzel ( 2004 ), i theory- 
driven model of the Gi/ S transition in mammalian cells. 



(X) /cJccU ~ 10 [here (X) represents the average expression levels of the vari- 
ables X over ah generated trajectories]. Figure 1 shows typical data sets for 
the two dynamical systems. 



3.1.2. Inference schemes. The fohowing inference schemes were assessed: 

Variable selection 
Design matrix 
Lagged predictors 
Variance function h{A) oc A~ 



{ Bayesian, AICc } 
{ Standard, Quadratic } 
{ No, Yes } 
a = { 0, 1, 2 , } 



For the design matrix "quadratic" refers to the augmentation of the pre- 
dictor set by the pairwise products of predictors, the simplest nonlinear 
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basis functions. For the variance function the symbol is used to de- 
note the VAR(g) model, which formally lacks a variance function. "Lagged 
predictors = Yes" indicates augmentation of the predictor set with lagged 
observations (a lag of ~ 28 mins is used for Cantone and ~ 10 mins for 
Swat). There are heuristic justifications for each of the candidate variance 
functions. For example, the function with a = 2 appears for small Aj when 
an exact Euler approximation and additive measurement error are assumed 
[Bansal and di Bernardo (2007)], whereas a = 1 is reminiscent of the Euler- 
Maruyama discretization equation (4). 

3.1.3. Empirical assessment. The performance of each inference scheme 
is quantified by the area under the receiver operating characteristic (ROC) 
curve (AUR), averaged over 20 data sets [Fawcett (2005)]. This metric, 
equivalent to the probability that a randomly chosen true edge is preferred 
by the inference scheme to a randomly chosen false edge, summarizes, across 
a range of thresholds, the ability to select edges in the true data-generating 
graph. Results presented below use a computationally favorable in-degree 
restriction dmax = 2. In order to check robustness to dmax, all experiments 
were repeated using dmax = 3, with no substantial changes in observed out- 
come (S Figure 6). 

3.2. Empirical results. 

3.2.1. Even sampling interval. Figure 2(a) displays box-plots over AUR 
scores for the Cantone dynamical system under even sampling intervals. 
Note that under even sampling, for an otherwise identical scheme, changing 
variance function does not affect the model, leading to identical AUR scores 
for schemes which differ only in variance function. (An exception to this is 
the VAR model, since the parameters A carry a subtly different meaning, 
which under a Bayesian formulation leads to a translation of the prior dis- 
tribution and in the information criteria case changes the definition of the 
predictor set.) 

Despite the presence of nonlinearities and memory in the cellular drift f , 
neither the use of quadratic basis functions nor the inclusion of lagged predic- 
tors appear to improve performance in terms of AUR. In order to verify that 
quadratic predictors are sufficiently nonlinear and that lagged predictors are 
sufficiently delayed, we repeated the investigation using both cubic predic- 
tors and using a delay twice as long. Results (SFigures 3 and 4) demonstrate 
that no improvement to the AUR scores is achieved in this way. 

Corresponding results for the Swat model are shown in Figure 2. Here we 
find that none of the methods perform well. 

We also performed inference using biochemical data from the experimental 
system reported in Cantone et al. (2009) (specifically the "switch-on" data 
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Fig. 2. An empirical comparison of network inference schemes. Simulated experiments 
based on published dynamical systems allow benchmarking of performance in terms of area 
under ROC curves (AUR; higher scores correspond to better network inference perfor- 
mance), (a) Even sampling intervals, (h) Uneven sampling intervals. 
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set therein). AUR scores obtained using this data (SFigure 5) were in close 
agreement with those obtained using synthetic data [Figure 2(a)], suggesting 
that the results of the simulations are relevant to real world studies. 

3.2.2. Uneven sampling intervals. Many biological time-course experi- 
ments are carried out with uneven sampling intervals. We therefore repeated 
the analysis above with sampling times of 0, 1, 5, 10, 15, 20, 30, 40, 50, 60, 75, 
90, 105, 120, 140, 160, 180, 210, 240 and 280 minutes. Figure 2(b) displays 
the AUR scores so obtained. We find that all the methods perform worse in 
the uneven sampling regime, with no method performing significantly bet- 
ter than random. Corresponding results for the Swat model are shown in 
SFigure 7. Again, here we find that none of the methods perform well. 

3.2.3. Consistency. Figure 3 displays AUR scores for Cantone for a large 
number of evenly sampled time points (n = 100), and the limiting case of zero 
measurement noise and zero cellular heterogeneity (cxmeas = 0, cJceii = 0, even 
sampling intervals). Consistency (in the sense of asymptotic convergence of 
the network estimate to the data-generating network) may be unattainable 
due to the nonidentifiability resulting from limited exploration of the dy- 
namical phase space. This lack of subjectivity means that in many cases 
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Fig. 3. Investigation of empirical consistency of network estimators, using the Cantone 
et al. (2009) model with even sampling intervals. Area under ROC curves are shown in 
the large data set, zero cellular heterogeneity and zero measurement noise limits. 
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inference cannot possibly reveal the full data-generating graph, although, 
as we have seen, network inference can nonetheless be informative. From 
Figure 3 we see that the Bayesian schemes using linear predictors approach 
AUR equal to unity, and in this sense show empirical consistency with re- 
spect to network inference. However, some of the other methods do not 
converge to the correct graph even in this limit. 

4. Discussion. The analyses presented here were aimed at better under- 
standing statistical network inference for biological applications. We showed 
how a broad class of approaches, including VAR models, linear DBNs and 
certain ODE-based approaches, are related to stochastic dynamics at the 
cellular level. We discuss a number of these aspects below and close with 
some views on future perspectives for network inference, including recom- 
mendations for practitioners. 

4.1. Time intervals. We found that uneven sampling intervals posed 
problems, even for methods that explicitly accounted for the sampling in- 
terval. Further insight may be gained from a "propagation of uncertainty" 
analysis of the approximations indicated in Section 2.2. Assuming the true 
large sample process obeys dX.°°/dt = F(X°°), we have that under an ob- 
servation process with independent additive Gaussian measurement error 
Y(t) ~ A^(X°°(t),M) an expansion for the variance Y{dY - F(Y)) over 
a time interval A is given by 

(17) MA-2 + (IA~i + L»F)M(IA-^ + DF)' + ■■■ 

(see SI for details). Recall that the model family in equation (10) approx- 
imates this variance by h{A)T>{al,...,al,), where /i(A) = A~". From this 
perspective it is clear that each variance function we considered captures 
only partial variation due to A. It is therefore not surprising that perfor- 
mance suffers in the uneven sampling regime, which requires the variance 
function to apply equally to large A as to small A. Moreover, a natural 
choice of variance function driven by equation (17) is not possible, since 
this would require knowledge of the unknown process F. The implication 
for experimental design is that, absent specific reasons for uneven sampling, 
it may be preferable to collect data at regular intervals. 

Figure 4 displays an approximation to the true variance function for the 
Cantone model (see SI). Observe that for small sampling intervals A the 
true curvature is best captured by a functional approximation of the form 
h{A) (X A~" with a = 1, 2, whereas for intervals larger than 10 mins (which 
are more common in practice) the flat approximation h{A) cx 1 correctly 
captures the asymptotic behavior. In applications where high frequency sam- 
pling is infeasible, the flat variance function might be a sensible choice. To 
understand whether difficulties related to sampling intervals disappear in 
the large sample limit, we repeated the empirical consistency analysis under 
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functional form for Cantone et al. (2009). For small time steps a power law A^™ provides 
a good approximation, hut for larger time steps a constant variance function may he more 
appropriate. In practice, the precise form of htme will he unknown. 

uneven sampling (SFigures 11 and 12). Interestingly, we found that none of 
the methods appeared to be empirically consistent, and that the choice of 
variance function is influential. However, unevenly sampled data are com- 
mon in biology and it may be the case that in some settings, the existence of 
multiple time scales (e.g., signaling, transcription, accumulating epigenetic 
alterations) mean that unevenly sampled data are nonetheless useful. Our 
findings suggest that care should be taken in the uneven sampling regime. 

4.2. Interventional data. The Cantone data are favorable in the sense 
that gene profiles show interesting time-varying behavior under global per- 
turbation, exploring a large proportion of the dynamical phase space. How- 
ever, such behavior is dependent on the specific dynamical system and is not 
displayed by the Swat model, which has a much larger phase space, being 
a nine-dimensional dynamical system. This may help explain the poor per- 
formance of all the methods on this latter model using global perturbation 
data and perhaps reinforces the intuitive notion that dynamics that are fa- 
vorable (in this informal sense) facilitate network inference. In some cases, 
perturbation data are available in which individual variables are inhibited 
(e.g., by RNA interference, gene knockouts or inhibitor treatments). Such 
data have the potential to explore much more of the dynamical phase space, 
including regions which cannot be accessed without direct inhibition of spe- 
cific molecular components. This is an important consideration because the 
statistical estimators described in Section 2.4 take the form 



(18) 



A = (Df (J-x))xe7e 
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where the average is over the region 7^ C ^ in state space visited during the 
experiments. Clearly, if the region (f(J^7^),7^) is only a small subspace of 
phase space, then the estimate equation (18) will be poor compared to one 
based on the entire phase space A* = {D{{J^x))x£X ■ 

To investigate the added value of interventional treatments for network 
inference, we repeated both the Cantone and Swat analyses with an ensem- 
ble of data sets obtained by inhibiting each variable in turn; this gave 5 and 
9 data sets for Cantone and Swat respectively. While no improvement to the 
Cantone AUR scores was observed (SFigure 15), there was improved per- 
formance for Swat (SFigure 16). This suggests that global perturbations are 
insufficient to explore the Swat dynamical phase space, and supports the 
intuitive notion that intervention experiments may be essential for infer- 
ence regarding larger dynamical systems. Nevertheless, AUR scores remain 
far from unity. This may be because the Swat drift function contains com- 
plex interaction terms which single interventions alone fail to elucidate. An 
important problem in experimental design will be to estimate how much 
(possibly combinatorial) intervention is required to achieve a certain level of 
network inference performance. 

We considered precise artificial intervention of single components in silico. 
However, biological interventions may be imprecise and imperfect. For ex- 
ample, RNA interference achieves only incomplete silencing of the target and 
small molecule inhibitors may have off-target effects. Moreover, at present 
such interventions are not instantaneous nor truly exogenous. This means 
that in many cases the system itself may be changed by the intervention, 
rendering resulting predictions inaccurate for the native system of interest. 
There remains a need for novel statistical methodology capable of analyzing 
time-course data under biological interventions. Existing literature in causal 
inference [Pearl (2009)] and related work in graphical models [Eaton and 
Murphy (2007)] are relevant, but in biological applications it may also be 
important to consider the mechanism of action of specific interventions. 

4.3. Nonlinear models. We focused on linear statistical models. Clearly, 
linear models are inadequate in many cases. For example, Rogers, Khanin 
and Girolami (2007) demonstrate the benefit of a nonlinear model based 
on Michaelis-Menten chemical kinetics for inference of transcription factor 
activity. However, network inference based on nonlinear ODEs remains chal- 
lenging [Xu et al. (2010)]. Alternatively, Aijo and Lahdesmaki (2009) con- 
sider the use of a nonparametric Gaussian process (CP) interaction term 
in the regression, which is naturally more flexible than linear regression us- 
ing finitely many basis functions. This may help to overcome the linearity 
restriction, but introduces additional degrees of freedom, including the CP 
covariance function and associated hyperparameters. While a thorough com- 
parison of such approaches was beyond the scope of this article, the potential 
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utility of nonparametric interaction terms is worthy of investigation. In this 
study we observed that neither the use of predictor products nor lagged 
predictors led to improved performance; this may reflect nontrivial coupling 
between cellular dynamics and the observed data. 

4.4. Single-cell data. In the future it may become possible to measure 
single-cell expression levels X'^ nondestructively (e.g., by live cell imag- 
ing), producing truly longitudinal data sets. It is interesting to consider 
how such data may impact upon the performance of regression-based net- 
work inference. Under independent additive Gaussian measurement error 
Y(t) ~ A^(X'^(t), M) an expansion for the single-cell variance V(dY — f) 
over a time interval A, in analogy with equation (17), is given by 



(see SI). Thus, a (single) longitudinal single-cell data set contains less infor- 
mation about the drift f than aggregate data [equation (17)] due to cellular 
stochasticity g. However, multiple longitudinal data sets may jointly contain 
more information than a single aggregate data set. To empirically test the 
utility of such data, we carried out network inference using 10 such longitu- 
dinal single-cell data sets on both the Cantone and Swat models, observed 
at even intervals with the same magnitude of measurement error as aggre- 
gate data. Results (SFigures 13 andl4) show a small improvement to the 
mean AUR scores, but reduction by a factor of about two in the variance 
of these scores (compared with the corresponding nonlongitudinal data), 
implying that the network estimators may be converging to an incorrect 
network. Bias may occur when the cellular drift f is not well approximated 
by a linear function, as is the case for the Swat model. Consider the ide- 
alized scenario where f = f (X) is Markovian and it is possible to observe 
longitudinal, single-cell expression levels. Under these apparently favorable 
circumstances even estimators obtained after a thorough exploration of state 
space may not offer good approximations, that is. A* ^ Df |x=o- As a toy 
example consider the cellular drift 



which is not well approximated by a linear function over the state space 
X = [0, 1]^. In this case averaging leads to cancellation 



(19) 



+ (IA~i + L>F)M(IA~^ + DF)' + A^^gg' + • • • 
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so that no interactions are inferred. Under such circumstances network in- 
ference is no longer possible using the naive linear regression approach. This 
suggests that network inference rooted in nonlinear models may be needed 
to fully exploit longitudinal single-cell data in the future. A related line of 
work addresses heterogeneity of the drift function in time by coupling DBNs 
with change point processes [Grzegorczyk and Husmeier (2011); Kolar, Song 
and Xing (2009); Lebre et al. (2010)]. A promising direction would be piece- 
wise linear regression modeling for network inference applications, where the 
heterogeneity appears in the spatial domain. 

4.5. High- dimensions and missing variables. We focused on the simplest 
possible case of fully observed, low-dimensional systems. There is a rich lit- 
erature in high-dimensional variable selection and related graphical models 
[Meinshausen and Biihlmann (2006); Hans, Dobra and West (2007); Fried- 
man, Hastie and Tibshirani (2008)] which applies equally to the regression 
models described here. The issues raised in this paper remain relevant in 
the high-dimensional setting. However, in practice, even high-dimensional 
observations are likely to be incomplete, since it is not currently possible 
to measure all relevant chemical species. Therefore, inferred relationships 
between variables may be indirect. This may be acceptable for the purpose 
of predicting the outcome of biochemical interventions (e.g., inhibiting gene 
or protein nodes), but limits stronger causal or mechanistic interpretations. 
Latent variable approaches are available [Beal et al. (2005)], but model se- 
lection can be challenging and remains an open area of research [Knowles 
and Ghahramani (2011)]. We note also that the missing variable issue for 
biological networks is arguably more severe than in, say, economics or epi- 
demiology, insofar as measured variables may represent only a small fraction 
of the true state vector, often with little specific insight available into the 
nature of the missing variables or their relationship to observations. Further 
work is required to better understand these issues in the context of inference 
for biological networks. 

4.6. Future perspectives. We found that a simple linear model could suc- 
cessfully infer network structure using globally perturbed time-course data 
from the Cantone system. It is encouraging that inference based only on as- 
sociations between variables, none of which were explicitly intervened upon, 
can in some cases be effective. Interventional designs should further enhance 
prospects for network inference. On the other hand, theoretical arguments, 
and the results we showed from the Swat system, emphasize that in some 
cases network structure may not be identifiable, even at the coarse level re- 
quired for qualitative biological prediction. On balance, we believe that net- 
work inference can be useful in generating biological hypotheses and guiding 
further experiment. However, the concerns we raise motivate a need for cau- 
tion in statistical analysis and interpretation of results. At the present time, 
we do not believe network inference should be treated as a routine analysis 
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in bioinformatics applications, but rather as an open research area that may, 
in the future, yield standard experimental and statistical protocols. 

Some specific recommendations that arise from the results presented here 
are as follows: 

• A default model. Our results suggest that a reasonable default choice of 
model for typical applications uses the standard design matrix with no 
lagged predictors and a flat variance function, corresponding to the linear 
model 

(22) dY{t,) ~ N{AY{tj_i),V{al . . .,a%)). 

Coupled with the Bayesian variable selection scheme outlined in Sec- 
tion 2.4.1, this simple model produced empirically consistent network es- 
timators for Cantone using evenly sampled global perturbation data (Fig- 
ure 3). 

• Diagnostics and validation. It is clear that network inference does not 
enjoy general theoretical guarantees and that the ability to successfully 
elucidate network structure depends on details of the specific system un- 
der study. Therefore, careful empirical validation on a case-by-case basis 
is essential. This should include statistical assessment of model fit, ro- 
bustness and predictive ability and, where possible, systematic validation 
using independent interventional data. 

• Experimental design. We suggest sampling evenly in time as a default 
choice. Interventional designs may be helpful to effectively explore larger 
dynamical phase spaces. However, to control the burden of experimentally 
exploring multiple time points, molecular species, interventions, culture 
conditions and biological samples, adaptive designs that prune experi- 
ments based on informativeness for the specific biological setting may be 
helpful [Xu et al. (2010)]. 

In conclusion, linear statistical models for networks are closely related 
to models of cellular dynamics and can shed light on patterns of biochem- 
ical regulation. However, biological network inference remains profoundly 
challenging, and in some cases may not be possible even in principle. Nev- 
ertheless, studies aimed at elucidating networks from high-throughput data 
are now commonplace and play a prominent role in biology. For this reason 
there remains an urgent need for both new methodology and theoretical 
and empirical investigation of existing approaches. Furthermore, there re- 
main many open questions in experimental design and analysis of designed 
experiments in this setting. 
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SUPPLEMENTARY MATERIAL 

Additional materials (DOL 10. 1214/11- AOAS532SUPP; .zip). This sup- 
plement provides the dynamical systems used in this paper and accompany- 
ing MATLAB R2010a scripts, derivations and additional figures SFigures 1- 
16. 
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